Abstract

The robustness to the prior of Bayesian inference procedures based on a
measure of statistical evidence is considered. These inferences are shown
to have optimal properties with respect to robustness. Furthermore, a
connection between robustness and prior-data conflict is established. In
particular, the inferences are shown to be effectively robust when the
choice of prior does not lead to prior-data conflict. When there is
prior-data conflict, however, robustness may fail to hold.


1 Introduction 

Robustness to the choice of the prior is an issue of considerable importance in a 
Bayesian statistical analysis. If an inference is very sensitive to the choice of the 
prior, then this could be viewed as either a negative for the inference method 
being used or for the choice of prior. In this paper it is shown that certain 
inferences are in a sense optimally robust to the choice of the prior. Furthermore, 
when the sensitivity of the inferences to the prior is measured quantitatively, it 
is shown that there is an intimate connection between the effective robustness 
of the inferences and whether or not there is prior-data conflict. So by choice of 
the inferential methodology and the avoidance of prior-data conflict, robustness 
of the inferences to the choice of prior is achieved. 

The basic ingredients for a statistical analysis are taken here to be the data
$x$, a statistical model $\{f_\theta : \theta \in \Theta\}$, where each $f_\theta$
is a probability density with respect to volume measure $\mu$ on the sample space
$\mathcal{X}$, and a proper prior density $\pi$ with respect to volume measure
$\nu$ on $\Theta$. Note that volume measure on a discrete set is taken to be
counting measure. Furthermore, suppose that interest is in making inferences
about the quantity $\psi = \Psi(\theta)$ where $\Psi : \Theta \to \Psi$ is onto
and, to save notation, we don't distinguish between the function and its range.

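Let $\pi_\Psi(\cdot \mid x)$ and $\pi_\Psi$ denote the posterior and prior
densities of $\psi$, where these are both taken with respect to support measure
$\nu_\Psi$ on $\Psi$. It follows that, under smoothness assumptions,

$$\pi_\Psi(\psi) = \int_{\Psi^{-1}\{\psi\}} \pi(\theta) J_\Psi(\theta)\, \nu_{\Psi^{-1}\{\psi\}}(d\theta)$$

where $J_\Psi(\theta) = (\det(d\Psi(\theta) \circ (d\Psi(\theta))^t))^{-1/2}$,
$d\Psi$ is the differential of $\Psi$ and $\nu_{\Psi^{-1}\{\psi\}}$ is volume
measure on $\Psi^{-1}\{\psi\}$. Also,

$$\pi_\Psi(\psi \mid x) = \int_{\Psi^{-1}\{\psi\}} \pi(\theta \mid x) J_\Psi(\theta)\, \nu_{\Psi^{-1}\{\psi\}}(d\theta)$$

where $\pi(\theta \mid x) = \pi(\theta) f_\theta(x)/m(x)$, with
$m(x) = \int_\Theta \pi(\theta) f_\theta(x)\, \nu(d\theta)$, is the posterior
density of $\theta$ with respect to $\nu$. Note that $m$ is the prior predictive
density of the data with respect to $\mu$. The conditional prior of $\theta$
given $\psi = \Psi(\theta)$ has density
$\pi(\theta \mid \psi) = \pi(\theta) J_\Psi(\theta)/\pi_\Psi(\psi)$ with respect
to $\nu_{\Psi^{-1}\{\psi\}}$ on the set $\Psi^{-1}\{\psi\}$. The conditional prior
predictive density of $x$ is then given by
$m(x \mid \psi) = \int_{\Psi^{-1}\{\psi\}} \pi(\theta \mid \psi) f_\theta(x)\, \nu_{\Psi^{-1}\{\psi\}}(d\theta)$.
A simple argument, see Baskurt and Evans (2013), gives the Savage-Dickey ratio
result that

$$\frac{\pi_\Psi(\psi \mid x)}{\pi_\Psi(\psi)} = \frac{m(x \mid \psi)}{m(x)}, \tag{1}$$

which has some use in the developments here.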

Robustness to the prior has been considered by many authors and there are a
number of different approaches. Many discussions are concerned with determining
the range of values that some characteristic of interest takes when the prior
is allowed to vary over some class. Berger (1990, 1994) contain broad reviews
of work on this topic and Rios Insua and Ruggeri (2000) is a collection of
papers by key contributors. Dey and Birmiwal (1994) considers global robustness
measures based upon measures of distance from the posterior distribution.

The approach taken here is to study robustness to the prior for relative belief
inferences for $\psi$ rather than all possible inferences. Relative belief
inferences are based on the relative belief ratio defined by

$$RB_\Psi(\psi \mid x) = \lim_{\delta \to 0} \frac{\Pi_\Psi(N_\delta(\psi) \mid x)}{\Pi_\Psi(N_\delta(\psi))} \tag{2}$$

whenever this limit exists for a sequence of neighborhoods $N_\delta(\psi)$ of
$\psi$ converging nicely to $\psi$ (see Rudin (1974) for the definition of
'converging nicely'). Under mild regularity conditions the limit exists and is
given by $RB_\Psi(\psi \mid x) = \pi_\Psi(\psi \mid x)/\pi_\Psi(\psi)$. Since
$RB_\Psi(\psi \mid x)$ measures the change in belief that $\psi$ is the true
value, it is a measure of evidence. Here $RB_\Psi(\psi \mid x) > 1$ means that
there is evidence in favor of $\psi$ being the true value, as belief in $\psi$
has increased after seeing the data, and $RB_\Psi(\psi \mid x) < 1$ means that
there is evidence against $\psi$ being the true value, as belief in $\psi$ has
decreased after seeing the data. Section 2 provides some more details concerning
relative belief inferences for both estimation and hypothesis assessment but
also see Baskurt and Evans (2013). Results in Section 3 establish that these
inferences have optimal robustness properties when the marginal prior for $\psi$
is allowed to vary over all possibilities in the class of $\epsilon$-contaminated
priors. This generalizes results found in Wasserman (1989), Ruggeri and
Wasserman (1993) and de la Horra and Fernandez (1994). Furthermore, an ambiguity
concerning the interpretation of the results is resolved. As such this provides
further justification for these inferences.

While inferences may be optimally robust, this does not imply that they are
in fact robust. In Section 4 quantitative measures of the sensitivity of relative
belief inferences to both the marginal prior of $\psi$ and the conditional prior
for $\theta$ given $\Psi(\theta) = \psi$ are derived. In Section 5 it is shown
that these inferences are indeed robust when the base prior $\pi$ does not suffer
from prior-data conflict. This adds weight to arguments concerning the importance
of checking for prior-data conflict before reporting inferences, as prior-data
conflict can imply sensitivity of the inferences to the choice of the prior.
Prior-data conflict is interpreted as the true value lying in the tails of the
prior and consistent methods have been developed for assessing this in Evans and
Moshonov (2006) and Evans and Jang (2011a). Methodology for modifying a prior
when prior-data conflict is encountered, through the selection of a prior weakly
informative with respect to the base prior, is developed in Evans and Jang
(2011b).

2 Relative Belief Inferences 

When $RB_\Psi(\psi \mid x) > 1$ this is the factor by which prior belief in the
truth of $\psi$ has increased after seeing the data. Clearly the bigger
$RB_\Psi(\psi \mid x)$ the more evidence there is in favor of $\psi$ while, when
$RB_\Psi(\psi \mid x) < 1$, the smaller $RB_\Psi(\psi \mid x)$ is the more
evidence there is against $\psi$. This leads to a total preference ordering on
$\Psi$, namely, $\psi_1$ is not preferred to $\psi_2$ whenever
$RB_\Psi(\psi_1 \mid x) \le RB_\Psi(\psi_2 \mid x)$ since there is at least as
much evidence for $\psi_2$ as there is for $\psi_1$. This in turn leads to
unambiguous solutions to inference problems.

The best estimate of $\psi$ is the value for which the evidence is greatest,
namely,

$$\psi(x) = \arg\sup_\psi RB_\Psi(\psi \mid x).$$

Associated with this estimate is a $\gamma$-relative belief credible region
$C_{\Psi,\gamma}(x) = \{\psi : RB_\Psi(\psi \mid x) \ge c_{\Psi,\gamma}(x)\}$ where
$c_{\Psi,\gamma}(x) = \inf\{k : \Pi_\Psi(RB_\Psi(\psi \mid x) \le k \mid x) \ge 1 - \gamma\}$.
Notice that $\psi(x) \in C_{\Psi,\gamma}(x)$ for every $\gamma \in [0,1]$ and so,
for selected $\gamma$, the size of $C_{\Psi,\gamma}(x)$ can be taken as a measure
of the accuracy of the estimate $\psi(x)$. The interpretation of
$RB_\Psi(\psi \mid x)$ as the evidence for $\psi$ forces the use of the sets
$C_{\Psi,\gamma}(x)$ for our credible regions. For if $\psi_1$ is in such a
region and $RB_\Psi(\psi_2 \mid x) \ge RB_\Psi(\psi_1 \mid x)$, then $\psi_2$
must be in the region as well, as there is at least as much evidence for
$\psi_2$ as for $\psi_1$. Optimal properties for relative belief credible
regions, in the class of all credible regions, have been established in Evans,
Guttman and Swartz (2006) and Evans and Shakhatreh (2008) and optimal
properties for $\psi(x)$ are established in Evans and Jang (2011c).
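
The estimate and credible region just described can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, a Bernoulli sample with a beta prior and $\Psi(\theta) = \theta$, and approximates the densities on a grid; the function names, the grid size and the midpoint rule are assumptions of this sketch and are not taken from the paper.

```python
import math

def beta_logpdf(theta, a, b):
    """Log density of the beta(a, b) distribution at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def relative_belief_summary(a0, b0, n, t, gamma, grid_size=2000):
    """Grid approximation of the relative belief estimate and gamma-region.

    Hypothetical setup: t successes in n Bernoulli trials, beta(a0, b0) prior.
    Returns (estimate, region lower end, region upper end, posterior content).
    """
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]  # midpoints in (0, 1)
    # By conjugacy the posterior is beta(a0 + t, b0 + n - t).
    post = [math.exp(beta_logpdf(th, a0 + t, b0 + n - t)) for th in grid]
    prior = [math.exp(beta_logpdf(th, a0, b0)) for th in grid]
    rb = [p / q for p, q in zip(post, prior)]  # RB = posterior / prior density
    estimate = grid[max(range(grid_size), key=lambda i: rb[i])]
    # Add grid points in decreasing order of RB until the content reaches gamma.
    order = sorted(range(grid_size), key=lambda i: rb[i], reverse=True)
    content, region = 0.0, []
    for i in order:
        region.append(grid[i])
        content += post[i] / grid_size
        if content >= gamma:
            break
    return estimate, min(region), max(region), content

print(relative_belief_summary(5, 20, 20, 3, 0.95))
```

Because the relative belief ratio is proportional to the likelihood when $\Psi(\theta) = \theta$, the grid maximizer lands next to the MLE $t/n$, and the region is built by ordering values by evidence rather than by posterior density.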

For the assessment of the hypothesis $H_0 : \Psi(\theta) = \psi_0$, the evidence
is given by $RB_\Psi(\psi_0 \mid x)$. One problem that both the relative belief
ratio and the Bayes factor share as measures of evidence is that it is not clear
how they should be calibrated. Certainly the bigger $RB_\Psi(\psi_0 \mid x)$ is
than 1, the more evidence we have in favor of $\psi_0$ while the smaller
$RB_\Psi(\psi_0 \mid x)$ is than 1, the more evidence we have against $\psi_0$.
But what exactly does a value of $RB_\Psi(\psi_0 \mid x) = 20$ mean? It would
appear to be strong evidence in favor of $\psi_0$ because beliefs have increased
by a factor of 20 after seeing the data. But what if other values of $\psi$ had
even larger increases? For example, the discussion in Baskurt and Evans (2013) of
the Jeffreys-Lindley paradox makes it clear that the value of a relative belief
ratio or a Bayes factor cannot always be interpreted as an indication of the
strength of the evidence.

The value $RB_\Psi(\psi_0 \mid x)$ can be calibrated by comparing it to the other
possible values $RB_\Psi(\cdot \mid x)$ through its posterior distribution. For
example, one possible measure of the strength is

$$\Pi_\Psi(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x) \tag{3}$$

which is the posterior probability that the true value of $\psi$ has a relative
belief ratio no greater than that of the hypothesized value $\psi_0$. While (3)
may look like a p-value, it has a very different interpretation. For when
$RB_\Psi(\psi_0 \mid x) < 1$, so there is evidence against $\psi_0$, then a small
value for (3) indicates a large posterior probability that the true value has a
relative belief ratio greater than $RB_\Psi(\psi_0 \mid x)$ and so there is
strong evidence against $\psi_0$. If $RB_\Psi(\psi_0 \mid x) > 1$, so there is
evidence in favor of $\psi_0$, then a large value for (3) indicates a small
posterior probability that the true value has a relative belief ratio greater
than $RB_\Psi(\psi_0 \mid x)$ and so there is strong evidence in favor of
$\psi_0$. Notice that, in the set
$\{\psi : RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x)\}$, the "best" estimate
of the true value is given by $\psi_0$ simply because the evidence for this value
is the largest in this set.
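
As a small illustration of (3), the following sketch computes the relative belief ratio and its strength when $\psi$ takes finitely many values. The dictionaries `prior` and `likelihood` (holding prior probabilities and the values $m(x \mid \psi)$) and the numbers in them are hypothetical inputs chosen for the example.

```python
def relative_belief_strength(prior, likelihood, psi0):
    """Return (RB(psi0 | x), strength) for a finite set of psi values.

    prior[psi] is the prior probability of psi and likelihood[psi] is
    m(x | psi); both are illustrative inputs, not data from the paper.
    """
    m_x = sum(prior[p] * likelihood[p] for p in prior)       # prior predictive m(x)
    posterior = {p: prior[p] * likelihood[p] / m_x for p in prior}
    rb = {p: likelihood[p] / m_x for p in prior}             # Savage-Dickey ratio
    # Posterior probability of values with no more evidence than psi0.
    strength = sum(posterior[p] for p in prior if rb[p] <= rb[psi0])
    return rb[psi0], strength

prior = {0: 0.5, 1: 0.3, 2: 0.2}
likelihood = {0: 0.1, 1: 0.4, 2: 0.2}
print(relative_belief_strength(prior, likelihood, psi0=2))
```

Here the hypothesized value has a ratio slightly below 1 and a strength of about 0.43, so the evidence against it is weak: a substantial part of the posterior mass sits on values with no more evidence than the hypothesized one.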

Various results have been established in Baskurt and Evans (2013) supporting
both $RB_\Psi(\psi_0 \mid x)$, as the measure of the evidence for $H_0$, and (3),
as a measure of the strength of that evidence. For example, the following simple
inequalities are useful in assessing the strength of the evidence, namely,

$$\Pi_\Psi(RB_\Psi(\psi \mid x) = RB_\Psi(\psi_0 \mid x) \mid x) \le \Pi_\Psi(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x) \le RB_\Psi(\psi_0 \mid x).$$

So if $RB_\Psi(\psi_0 \mid x) > 1$ and
$\Pi_\Psi(\{RB_\Psi(\psi_0 \mid x)\} \mid x)$ is large, there is strong evidence
in favor of $\psi_0$ while, if $RB_\Psi(\psi_0 \mid x) < 1$ is very small, then
there is immediately strong evidence against $\psi_0$. Also, in situations where
there are only a few possible values of $\psi$, then
$\Pi_\Psi(RB_\Psi(\psi \mid x) = RB_\Psi(\psi_0 \mid x) \mid x)$ can be a more
appropriate measure of strength.

When interest is in making inferences about $\psi = \Psi(\theta)$, it is
reasonable to ask how sensitive the relative belief approach is to the
ingredients given by the prior. This entails examining how dependent
$\psi(x)$, $C_{\Psi,\gamma}(x)$, $RB_\Psi(\psi_0 \mid x)$ and
$\Pi_\Psi(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x)$ are on changes
in the prior, as these four objects represent the essential relative belief
inferences.

The full prior $\pi$ for $\theta$ can always be factored as
$\pi(\theta) = \pi_\Psi(\psi)\pi(\theta \mid \psi)$. In contrast to other
discussions of robustness with respect to the prior, the sensitivity of the
inferences to $\pi_\Psi$ and the sensitivity of the inferences to
$\pi(\cdot \mid \psi)$ are considered separately, as this leads to more
information concerning where the lack of robustness arises when this occurs.


3 Optimal Robustness With Respect to the Marginal 
Prior 

The result (1) implies that $RB_\Psi(\psi \mid x) = m(x \mid \psi)/m(x)$. From
this it is immediate that
$\psi(x) = \arg\sup_\psi RB_\Psi(\psi \mid x) = \arg\sup_\psi m(x \mid \psi)$ and
so the relative belief estimate is optimally robust to $\pi_\Psi$, as the
estimate has no dependence on the marginal prior. Furthermore,
$C_{\Psi,\gamma}(x)$ is of the form $\{\psi : m(x \mid \psi) \ge k\}$ for some $k$
and so the form of relative belief regions for $\psi$ is optimally robust to
$\pi_\Psi$. The specific region chosen for the assessment of the accuracy of
$\psi(x)$ depends on the posterior and so is not independent of $\pi_\Psi$. It is
now proved that $C_{\Psi,\gamma}(x)$ has an optimal robustness property among all
credible regions for $\psi$.

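Consider $\epsilon$-contaminated priors for $\theta$ of the form

$$\Pi_\epsilon = \Pi(\cdot \mid \psi) \times [(1 - \epsilon)\Pi_\Psi + \epsilon Q], \tag{4}$$

where $Q$ is a probability measure on $\Psi$ and $\Pi$ is the base prior as
described in the Introduction. Note that the conditional prior of $\theta$ given
$\Psi(\theta) = \psi$ is fixed and independent of $\epsilon$.

To assess the robustness of the posterior content of a set $A \subset \Psi$ it
makes sense to look at
$\delta(A) = \Pi^{upper}_\Psi(A \mid x) - \Pi^{lower}_\Psi(A \mid x)$ where
$\Pi^{upper}_\Psi(A \mid x) = \sup_Q \Pi_{\epsilon,\Psi}(A \mid x)$ and
$\Pi^{lower}_\Psi(A \mid x) = \inf_Q \Pi_{\epsilon,\Psi}(A \mid x)$, with
$\Pi_{\epsilon,\Psi}(\cdot \mid x)$ the posterior of $\psi$ under (4), and the
supremum/infimum is taken over all probability measures on $\Psi$. For this let
$\epsilon^* = \epsilon/(1 - \epsilon)$ and
$r(A) = \sup_{\psi \in A} RB_\Psi(\psi \mid x) = \sup_{\psi \in A} m(x \mid \psi)/m(x)$,
so $r(\Psi) = RB_\Psi(\psi(x) \mid x)$ and always one and only one of
$r(A), r(A^c)$ equals $r(\Psi)$.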

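The following result is needed and a proof is provided in the Appendix.

Lemma 1 (Huber (1973)) Let $Q$ denote a probability measure on $\Psi$. For prior
measure $\Pi_{\epsilon,\Psi} = (1 - \epsilon)\Pi_\Psi + \epsilon Q$ on $\Psi$ and
$A \subset \Psi$,

(i) $\Pi^{upper}_\Psi(A \mid x) = (\Pi_\Psi(A \mid x) + \epsilon^* r(A))/(1 + \epsilon^* r(A))$,

(ii) $\Pi^{lower}_\Psi(A \mid x) = \Pi_\Psi(A \mid x)/(1 + \epsilon^* r(A^c))$,

(iii) $$\delta(A) = \frac{\Pi_\Psi(A \mid x)\,\epsilon^*(r(A^c) - r(A))}{(1 + \epsilon^* r(A))(1 + \epsilon^* r(A^c))} + \frac{\epsilon^* r(A)}{1 + \epsilon^* r(A)}$$

and (iv) $\delta(A^c) = \delta(A)$.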
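
The formulas in Lemma 1 can be checked numerically in a toy setting. The sketch below assumes a finite set of $\psi$ values with no nuisance parameter, so that `likelihood[psi]` plays the role of $m(x \mid \psi)$; the helper names and the inputs are hypothetical. It evaluates parts (i) to (iii) and compares them with the posterior content obtained from an explicit contaminating distribution `q`.

```python
def contamination_bounds(prior, likelihood, A, eps):
    """Upper bound, lower bound and delta(A) from Lemma 1 (i)-(iii)."""
    m_x = sum(prior[p] * likelihood[p] for p in prior)
    post_A = sum(prior[p] * likelihood[p] for p in A) / m_x
    complement = [p for p in prior if p not in A]
    r_A = max(likelihood[p] for p in A) / m_x             # r(A)
    r_Ac = max(likelihood[p] for p in complement) / m_x   # r(A^c)
    e_star = eps / (1 - eps)
    upper = (post_A + e_star * r_A) / (1 + e_star * r_A)  # part (i)
    lower = post_A / (1 + e_star * r_Ac)                  # part (ii)
    return upper, lower, upper - lower

def posterior_content(prior, likelihood, A, eps, q):
    """Posterior content of A under the prior (1 - eps) * prior + eps * q."""
    mixed = {p: (1 - eps) * prior[p] + eps * q.get(p, 0.0) for p in prior}
    m_x = sum(mixed[p] * likelihood[p] for p in mixed)
    return sum(mixed[p] * likelihood[p] for p in A) / m_x

prior = {0: 0.5, 1: 0.3, 2: 0.2}
likelihood = {0: 0.1, 1: 0.4, 2: 0.2}
print(contamination_bounds(prior, likelihood, [1], 0.1))
print(posterior_content(prior, likelihood, [1], 0.1, {1: 1.0}))  # attains the upper bound
```

In this finite example the upper bound is attained by a point mass at the maximizer of $m(x \mid \psi)$ within $A$ and the lower bound by a point mass at the maximizer within $A^c$, which also makes part (iv) easy to confirm by swapping the roles of $A$ and $A^c$.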

Let $\gamma^*(x) = \Pi_\Psi(C_{\Psi,\gamma}(x) \mid x)$ be the exact posterior
content of the $\gamma$-relative belief region. The following result generalizes
results found in Wasserman (1989) and de la Horra and Fernandez (1994) who
considered robustness to the prior of credible regions for the full parameter
$\theta$. In particular, this result applies to arbitrary parameters
$\psi = \Psi(\theta)$ and does not require continuity.

Proposition 2 The following hold,

(i) among all sets $A \subset \Psi$ satisfying
$\Pi_\Psi(A \mid x) \le \gamma^*(x)$ and $r(A) = r(\Psi)$, the set
$C_{\Psi,\gamma}(x)$ minimizes $\delta(A)$,

(ii) among all sets $A \subset \Psi$ satisfying
$\Pi_\Psi(A \mid x) \ge \gamma^*(x)$ and $r(A^c) = r(\Psi)$, the set
$C^c_{\Psi,1-\gamma}(x)$ minimizes $\delta(A)$,

(iii) when $\gamma^*(x) = \gamma > 1/2$ then, among all sets $A \subset \Psi$
satisfying $\Pi_\Psi(A \mid x) = \gamma$, the set $C_{\Psi,\gamma}(x)$ minimizes
$\delta(A)$.

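Proof. (i) For any set $A$ with $r(A) = r(\Psi)$ we have
$r(A^c) - r(A) = r(A^c) - r(\Psi) \le 0$. Therefore,

$$\delta(A) = \frac{\Pi_\Psi(A \mid x)\,\epsilon^*(r(A^c) - r(\Psi))}{(1 + \epsilon^* r(\Psi))(1 + \epsilon^* r(A^c))} + \frac{\epsilon^* r(\Psi)}{1 + \epsilon^* r(\Psi)}$$

$$\ge \frac{\Pi_\Psi(C_{\Psi,\gamma}(x) \mid x)\,\epsilon^*(r(A^c) - r(\Psi))}{(1 + \epsilon^* r(\Psi))(1 + \epsilon^* r(A^c))} + \frac{\epsilon^* r(\Psi)}{1 + \epsilon^* r(\Psi)}.$$

Now

$$\frac{r(A^c) - r(\Psi)}{1 + \epsilon^* r(A^c)} \tag{5}$$

is increasing in $r(A^c)$ so we need to show that $A = C_{\Psi,\gamma}(x)$
minimizes $r(A^c)$ among all $A$ satisfying
$\Pi_\Psi(A \mid x) \le \Pi_\Psi(C_{\Psi,\gamma}(x) \mid x)$ and $r(A) = r(\Psi)$.
Suppose that $r(A^c) < r(C^c_{\Psi,\gamma}(x))$ and let
$B = \{\psi : RB_\Psi(\psi \mid x) > r(A^c)\}$. Note that
$r(C^c_{\Psi,\gamma}(x)) \le c_{\Psi,\gamma}(x) \le RB_\Psi(\psi \mid x)$ for
every $\psi \in C_{\Psi,\gamma}(x)$, so $C_{\Psi,\gamma}(x) \subset B$, which
implies $\Pi_\Psi(B \mid x) > \Pi_\Psi(C_{\Psi,\gamma}(x) \mid x)$ with the
strictness of the inequality following from the definition of
$C_{\Psi,\gamma}(x)$. But also $B \subset A$ which contradicts
$\Pi_\Psi(A \mid x) \le \Pi_\Psi(C_{\Psi,\gamma}(x) \mid x)$ and so we must have
$r(A^c) \ge r(C^c_{\Psi,\gamma}(x))$. This establishes that $\delta(A)$ is
minimized by $A = C_{\Psi,\gamma}(x)$.

(ii) Now consider all the sets $A$ with $r(A^c) = r(\Psi)$. Since
$\delta(A) = \delta(A^c)$, it is equivalent to minimize $\delta(A^c)$ among all
sets $A^c$ satisfying
$\Pi_\Psi(A^c \mid x) \le \Pi_\Psi(C^c_{\Psi,\gamma}(x) \mid x) = 1 - \gamma^*(x)$
and $r(A^c) = r(\Psi)$. By part (i) this is minimized by taking
$A^c = C_{\Psi,1-\gamma}(x)$ and the result is proved.

(iii) The solutions to the optimization problems in parts (i) and (ii), namely,
$C_{\Psi,\gamma}(x)$ and $C^c_{\Psi,1-\gamma}(x)$ respectively, both have
posterior content equal to $\gamma$. As such one of these sets is the solution
to the optimization problem stated in (iii). We have that

$$\delta(C_{\Psi,\gamma}(x)) - \delta(C^c_{\Psi,1-\gamma}(x)) = \delta(C_{\Psi,\gamma}(x)) - \delta(C_{\Psi,1-\gamma}(x))$$

$$= \frac{\gamma\epsilon^*(r(C^c_{\Psi,\gamma}(x)) - r(\Psi))}{(1 + \epsilon^* r(\Psi))(1 + \epsilon^* r(C^c_{\Psi,\gamma}(x)))} - \frac{(1-\gamma)\epsilon^*(r(C^c_{\Psi,1-\gamma}(x)) - r(\Psi))}{(1 + \epsilon^* r(\Psi))(1 + \epsilon^* r(C^c_{\Psi,1-\gamma}(x)))}$$

$$\le \frac{\gamma\epsilon^*}{1 + \epsilon^* r(\Psi)} \left\{ \frac{r(C^c_{\Psi,\gamma}(x)) - r(\Psi)}{1 + \epsilon^* r(C^c_{\Psi,\gamma}(x))} - \frac{r(C^c_{\Psi,1-\gamma}(x)) - r(\Psi)}{1 + \epsilon^* r(C^c_{\Psi,1-\gamma}(x))} \right\},$$

where the inequality uses $1 - \gamma \le \gamma$ and
$r(C^c_{\Psi,1-\gamma}(x)) - r(\Psi) \le 0$. The result follows from this because
$C^c_{\Psi,\gamma}(x) \subset C^c_{\Psi,1-\gamma}(x)$, so
$r(C^c_{\Psi,\gamma}(x)) \le r(C^c_{\Psi,1-\gamma}(x))$, and (5) is increasing in
$r(A^c)$. ■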

It is interesting to consider the statistical meaning of the separate parts of
Proposition 2, as the statements create a degree of ambiguity. If a system of
credible regions is being used, say $B_{\Psi,\gamma}(x)$, then it makes sense to
require that these sets are monotonically increasing in $\gamma$ and that the
smallest set $\lim_{\gamma \to 0} B_{\Psi,\gamma}(x)$ contains a single point
which is taken as the estimate of $\psi$. The size of $B_{\Psi,\gamma}(x)$, for
some specific $\gamma$, can then be taken as an assessment of the accuracy of the
estimate, where size is measured in some application dependent way. The relative
belief regions satisfy this and the estimate, under the assumption of a unique
maximizer of $RB_\Psi(\cdot \mid x)$, is $\psi(x)$. So effectively (i) is saying
that $C_{\Psi,\gamma}(x)$ is the most robust system of credible regions with
respect to posterior content. Note that we have to exclude sets $A$ with
$\Pi_\Psi(A \mid x) > \gamma^*(x)$ because, for example, the set $A = \Psi$ is
always optimally robust with respect to content but does not provide a
meaningful assessment of the accuracy of the estimate. Given that $\psi(x)$ and
the form of $C_{\Psi,\gamma}(x)$ are optimally robust, this further supports the
claim that relative belief estimation is optimally robust to the choice of the
marginal prior. Note that the sets in (ii) do not satisfy the stated criteria for
being a system of credible regions.

Part (iii) indicates that, when there are many sets with posterior content
exactly equal to $\gamma$, and this is typically true in the continuous case,
then $C_{\Psi,\gamma}(x)$ is optimally robust among these sets with respect to
content. It makes sense to require $\gamma^*(x) \ge 1/2$ for any credible region
as, if $\gamma^*(x) < 1/2$, then there is more belief that the true value is in
$C^c_{\Psi,\gamma}(x)$ than in $C_{\Psi,\gamma}(x)$.

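Applying Lemma 1 with $A = C_{\Psi,\gamma}(x)$, for which $r(A) = r(\Psi)$, gives

$$\delta(C_{\Psi,\gamma}(x)) = \frac{\gamma^*(x)\,\epsilon^*(r(C^c_{\Psi,\gamma}(x)) - r(\Psi))}{(1 + \epsilon^* r(\Psi))(1 + \epsilon^* r(C^c_{\Psi,\gamma}(x)))} + \frac{\epsilon^* r(\Psi)}{1 + \epsilon^* r(\Psi)}$$

and this can be close to 1 when $RB_\Psi(\psi(x) \mid x)$ is large. So, while
$C_{\Psi,\gamma}(x)$ possesses an optimal robustness property with respect to
posterior content, this does not imply that the posterior content is necessarily
robust. This depends on other aspects of the particular problem which will be
discussed.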

4 Measuring Robustness Quantitatively 

To measure the robustness of an inference to the prior $\pi$, when using the
$\epsilon$-contaminated class, it is natural to look at Gateaux derivatives of
the relevant quantity at $\pi$ in various directions $Q$. The derivative is a
measure of the sensitivity of the inference to small changes in the prior and so
is local in nature. When the derivative is large for some $Q$, the inference is
highly sensitive to the prior chosen and naturally this is viewed negatively. In
this section this behavior of relative belief inferences is analyzed separately
for $\epsilon$-contaminated classes for the marginal $\pi_\Psi$ and the
conditional $\pi(\cdot \mid \psi)$.

4.1 Sensitivity to the Marginal Prior 

Consider the family of priors given by (4) but now restricted to those $Q$ that
are also absolutely continuous with respect to $\nu_\Psi$ on $\Psi$ and let $q$
denote the density of $Q$. The posterior of $\psi$ based on the contaminated
prior is
$\Pi_{\epsilon,\Psi}(\cdot \mid x) = (1 - \epsilon_x)\Pi_\Psi(\cdot \mid x) + \epsilon_x Q(\cdot \mid x)$
where $\epsilon_x = \epsilon m_Q(x)/[(1 - \epsilon)m(x) + \epsilon m_Q(x)]$,
$m_Q(x) = \int_\Psi m(x \mid \psi)\, Q(d\psi)$ and
$Q(A \mid x) = \int_A (m(x \mid \psi)/m_Q(x))\, Q(d\psi)$. The relative belief
ratio for $\psi$ based on a general $\Pi_\epsilon$ equals
$RB_{\epsilon,\Psi}(\psi \mid x) = (1 - \epsilon_x)RB_\Psi(\psi \mid x) + \epsilon_x RB_{Q,\Psi}(\psi \mid x)$
and here, using (1), $RB_{Q,\Psi}(\psi \mid x) = m(x \mid \psi)/m_Q(x)$ so

$$RB_{\epsilon,\Psi}(\psi \mid x) = \frac{RB_\Psi(\psi \mid x)}{1 - \epsilon(1 - m_Q(x)/m(x))}. \tag{6}$$
The following result gives the Gateaux derivative of the relative belief ratio.


Proposition 3 The Gateaux derivative of $RB_\Psi(\cdot \mid x)$ at $\psi$ in the
direction $Q$ equals

$$RB_\Psi(\psi \mid x)\{1 - m_Q(x)/m(x)\}. \tag{7}$$


Proof. From (6),

$$\lim_{\epsilon \to 0} \frac{RB_{\epsilon,\Psi}(\psi \mid x) - RB_\Psi(\psi \mid x)}{\epsilon} = RB_\Psi(\psi \mid x) \lim_{\epsilon \to 0} \left( \frac{1 - m_Q(x)/m(x)}{1 - \epsilon(1 - m_Q(x)/m(x))} \right).$$

■


The value of (7) can be large simply because $RB_\Psi(\psi \mid x)$ is large, so
it makes more sense to look at the relative change as given by
$1 - m_Q(x)/m(x)$. Therefore, for small $\epsilon$,

$$\frac{|RB_{\epsilon,\Psi}(\psi \mid x) - RB_\Psi(\psi \mid x)|}{RB_\Psi(\psi \mid x)} \approx \epsilon \left| 1 - \frac{m_Q(x)}{m(x)} \right|$$

implying a small relative change in $RB_\Psi(\psi \mid x)$ when $m_Q(x)/m(x)$ is
not large.
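
As a quick sanity check on (6) and (7), the following sketch evaluates the contaminated relative belief ratio for a small $\epsilon$ and compares the resulting difference quotient with the stated derivative. The function names and the numerical inputs are illustrative assumptions only.

```python
def rb_contaminated(rb, ratio, eps):
    """Equation (6): RB under the eps-contaminated marginal prior.

    rb is RB(psi | x) under the base prior and ratio is m_Q(x) / m(x).
    """
    return rb / (1 - eps * (1 - ratio))

def gateaux_derivative(rb, ratio):
    """Equation (7): derivative of (6) with respect to eps at eps = 0."""
    return rb * (1 - ratio)

rb, ratio, eps = 2.0, 0.5, 1e-6
finite_difference = (rb_contaminated(rb, ratio, eps) - rb) / eps
print(finite_difference, gateaux_derivative(rb, ratio))
```

The two printed values agree to several decimal places, and dividing either by `rb` recovers the relative change $1 - m_Q(x)/m(x)$ discussed above.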
The Gateaux derivative of the strength of the evidence is now computed. 

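Proposition 4 The Gateaux derivative of
$\Pi_\Psi(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x)$ at $\psi_0$ in
the direction $Q$ is

$$\frac{m_Q(x)}{m(x)} \left\{ Q(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x) - \Pi_\Psi(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x) \right\}.$$

Proof. The strength based on $\Pi_\epsilon$ satisfies

$$\Pi_{\epsilon,\Psi}(RB_{\epsilon,\Psi}(\psi \mid x) \le RB_{\epsilon,\Psi}(\psi_0 \mid x) \mid x) = (1 - \epsilon_x)\Pi_\Psi(RB_{\epsilon,\Psi}(\psi \mid x) \le RB_{\epsilon,\Psi}(\psi_0 \mid x) \mid x) + \epsilon_x Q(RB_{\epsilon,\Psi}(\psi \mid x) \le RB_{\epsilon,\Psi}(\psi_0 \mid x) \mid x).$$

So, using (1),

$$\Pi_{\epsilon,\Psi}(RB_{\epsilon,\Psi}(\psi \mid x) \le RB_{\epsilon,\Psi}(\psi_0 \mid x) \mid x) = \Pi_\Psi(m(x \mid \psi) \le m(x \mid \psi_0) \mid x) + \epsilon_x \{Q(m(x \mid \psi) \le m(x \mid \psi_0) \mid x) - \Pi_\Psi(m(x \mid \psi) \le m(x \mid \psi_0) \mid x)\}.$$

This implies that

$$\lim_{\epsilon \to 0} \frac{\Pi_{\epsilon,\Psi}(RB_{\epsilon,\Psi}(\psi \mid x) \le RB_{\epsilon,\Psi}(\psi_0 \mid x) \mid x) - \Pi_\Psi(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x)}{\epsilon}$$

$$= \frac{m_Q(x)}{m(x)} \{Q(m(x \mid \psi) \le m(x \mid \psi_0) \mid x) - \Pi_\Psi(m(x \mid \psi) \le m(x \mid \psi_0) \mid x)\}$$

$$= \frac{m_Q(x)}{m(x)} \left\{ Q(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x) - \Pi_\Psi(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x) \right\}.$$

■

So the strength is robust to choice of the marginal prior whenever $m_Q(x)/m(x)$
is small.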

For both the measure of evidence $RB_\Psi(\psi_0 \mid x)$ and its strength, the
ratio $m_Q(x)/m(x)$ plays a key role in determining the robustness. The
implications of this are discussed in Section 5. Note that
$\sup_Q m_Q(x)/m(x) = RB_\Psi(\psi(x) \mid x)$ gives the worst case behavior of
this ratio.

It is of interest to contrast these results with those for the commonly used
MAP inferences which are based on the posterior density
$\pi_\Psi(\cdot \mid x)$.

Proposition 5 The Gateaux derivative of the posterior density of $\psi$ in the
direction $Q$ at $\psi_0$ is given by
$\{m_Q(x)/m(x)\}\{q(\psi_0 \mid x) - \pi_\Psi(\psi_0 \mid x)\}$.

Proof. Since
$\pi_{\epsilon,\Psi}(\psi \mid x) = (1 - \epsilon_x)\pi_\Psi(\psi \mid x) + \epsilon_x q(\psi \mid x)$
it follows that

$$\lim_{\epsilon \to 0} \frac{\pi_{\epsilon,\Psi}(\psi_0 \mid x) - \pi_\Psi(\psi_0 \mid x)}{\epsilon} = \frac{m_Q(x)}{m(x)} (q(\psi_0 \mid x) - \pi_\Psi(\psi_0 \mid x)).$$

■

Note that MAP-based inferences implicitly use $\pi_\Psi(\psi_0 \mid x)$ as a
measure of the evidence that $\psi_0$ is the true value. Comparing this with the
relative belief ratio we see that, for small $\epsilon$,

$$\frac{|\pi_{\epsilon,\Psi}(\psi_0 \mid x) - \pi_\Psi(\psi_0 \mid x)|}{\pi_\Psi(\psi_0 \mid x)} \approx \epsilon\, \frac{m_Q(x)}{m(x)} \left| 1 - \frac{q(\psi_0 \mid x)}{\pi_\Psi(\psi_0 \mid x)} \right|$$

and the relative change in $\pi_\Psi(\psi_0 \mid x)$ is dependent on the ratio of
the posteriors as well as $m_Q(x)/m(x)$. So if $\pi_\Psi(\psi_0 \mid x)$ is small
relative to $q(\psi_0 \mid x)$ we will get a big relative change and this
suggests that MAP inferences are much less robust than relative belief
inferences. A similar result is obtained for the Bayesian p-value in Evans and
Zou (2001).


4.2 Sensitivity to the Conditional Prior 

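Consider now priors for $\theta$ of the form
$\Pi_\epsilon = [(1 - \epsilon)\Pi(\cdot \mid \psi) + \epsilon Q(\cdot \mid \psi)] \times \Pi_\Psi$
where $Q(\cdot \mid \psi)$ is a probability measure on $\Psi^{-1}\{\psi\}$
absolutely continuous with respect to $\nu_{\Psi^{-1}\{\psi\}}$ with density
$q(\cdot \mid \psi)$, for each $\psi \in \Psi$. So the marginal prior of $\psi$ is
now fixed and the conditional prior of $\theta$ is perturbed. The posterior of
$\psi$ based on this prior is
$\Pi_{\epsilon,\Psi}(\cdot \mid x) = (1 - \epsilon_x)\Pi_\Psi(\cdot \mid x) + \epsilon_x Q_\Psi(\cdot \mid x)$
where $Q_\Psi(A \mid x) = \int_A (m_Q(x \mid \psi)/m_Q(x))\, \Pi_\Psi(d\psi)$,
$m_Q(x \mid \psi) = \int_{\Psi^{-1}\{\psi\}} f_\theta(x)\, Q(d\theta \mid \psi)$ and
$m_Q(x) = \int_\Psi m_Q(x \mid \psi)\, \Pi_\Psi(d\psi)$.

The relative belief ratio for $\psi$ based on $\Pi_\epsilon$ equals
$RB_{\epsilon,\Psi}(\psi \mid x) = (1 - \epsilon_x)RB_\Psi(\psi \mid x) + \epsilon_x RB_{Q,\Psi}(\psi \mid x)$
where now $RB_{Q,\Psi}(\psi_0 \mid x) = m_Q(x \mid \psi_0)/m_Q(x)$.

This leads to the following result.

Proposition 6 The Gateaux derivative of $RB_\Psi(\cdot \mid x)$ at $\psi_0$ in the
direction $Q$ is
$\{m_Q(x)/m(x)\}(RB_{Q,\Psi}(\psi_0 \mid x) - RB_\Psi(\psi_0 \mid x))$.

Proof. Clearly,

$$\lim_{\epsilon \to 0} \frac{RB_{\epsilon,\Psi}(\psi_0 \mid x) - RB_\Psi(\psi_0 \mid x)}{\epsilon} = \frac{m_Q(x)}{m(x)} (RB_{Q,\Psi}(\psi_0 \mid x) - RB_\Psi(\psi_0 \mid x)).$$

■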


The implications of this result for robustness are discussed in Section 5. 
Now consider the robustness of the strength of the evidence. 
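Proposition 7 If $RB_\Psi(\cdot \mid x)$ has a discrete distribution with support
containing no limit points, the Gateaux derivative of
$\Pi_\Psi(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x)$ at $\psi_0$ in
the direction $Q$ equals 0. When $RB_\Psi(\cdot \mid x)$ has a continuous
distribution under $\Pi_\Psi(\cdot \mid x)$ with density $g(\cdot \mid x)$, the
Gateaux derivative of
$\Pi_\Psi(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x)$ at $\psi_0$ in
the direction $Q$ equals

$$\{m_Q(x)/m(x)\} RB_{Q,\Psi}(\psi_0 \mid x)\, g(RB_\Psi(\psi_0 \mid x) \mid x).$$


Proof. Since
$RB_{\epsilon,\Psi}(\psi \mid x) \le RB_{\epsilon,\Psi}(\psi_0 \mid x)$ if and only
if

$$(1 - \epsilon_x)RB_\Psi(\psi \mid x) + \epsilon_x RB_{Q,\Psi}(\psi \mid x) \le (1 - \epsilon_x)RB_\Psi(\psi_0 \mid x) + \epsilon_x RB_{Q,\Psi}(\psi_0 \mid x)$$

then, for all $\epsilon > 0$ such that $\epsilon_x < 1$,

$$\Pi_\Psi(RB_{\epsilon,\Psi}(\psi \mid x) \le RB_{\epsilon,\Psi}(\psi_0 \mid x) \mid x) \le \Pi_\Psi\left( RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) + \frac{\epsilon_x}{1 - \epsilon_x} RB_{Q,\Psi}(\psi_0 \mid x) \,\Big|\, x \right)$$

and for all $\epsilon < 0$,

$$\Pi_\Psi(RB_{\epsilon,\Psi}(\psi \mid x) \le RB_{\epsilon,\Psi}(\psi_0 \mid x) \mid x) \ge \Pi_\Psi\left( RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) + \frac{\epsilon_x}{1 - \epsilon_x} RB_{Q,\Psi}(\psi_0 \mid x) \,\Big|\, x \right).$$

When $RB_\Psi(\cdot \mid x)$ has a discrete distribution with support containing
no limit points, then the lower and upper bounds equal
$\Pi_\Psi(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x)$ for all
$\epsilon$ small enough and the result follows. When $RB_\Psi(\cdot \mid x)$ has
a continuous distribution with density $g(\cdot \mid x)$, then

$$\lim_{\epsilon \to 0} \frac{\Pi_\Psi(RB_{\epsilon,\Psi}(\psi \mid x) \le RB_{\epsilon,\Psi}(\psi_0 \mid x) \mid x) - \Pi_\Psi(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \mid x)}{\epsilon} = \{m_Q(x)/m(x)\} RB_{Q,\Psi}(\psi_0 \mid x)\, g(RB_\Psi(\psi_0 \mid x) \mid x).$$

■

From this it is seen that in the discrete case the strength is insensitive to
local changes in the prior.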

Consider the continuous case. When there is strong evidence either for or
against $\psi_0$, then $RB_\Psi(\psi_0 \mid x)$ will be in the right or left tail
correspondingly of the posterior distribution of $RB_\Psi(\cdot \mid x)$ and so
$g(RB_\Psi(\psi_0 \mid x) \mid x)$ will tend to be small. As such the strength
will be robust to small changes in the prior provided $m_Q(x)/m(x)$ is not
large. When there is not strong evidence, however, then
$g(RB_\Psi(\psi_0 \mid x) \mid x)$ could be large and, if $m_Q(x)/m(x)$ is not
small, then the strength is not robust. This underscores a recommendation in
Baskurt and Evans (2013) that in the continuous case the parameter be
discretized when assessing the evidence and its strength. For this, when $\psi$
is real-valued, let $\delta > 0$ be the difference between two $\psi$ values that
is deemed to be of practical importance. The prior and posterior distributions
of $\psi$ discretized to the intervals
$[\psi_0 + (2i - 1)\delta/2, \psi_0 + (2i + 1)\delta/2)$ for $i \in \mathbb{Z}$ are
then used to assess the hypothesis, which corresponds to the interval
$[\psi_0 - \delta/2, \psi_0 + \delta/2)$. By Proposition 7 the strength is then
insensitive to small changes in the prior.

It is perhaps not surprising that the robustness behavior of the relative 
belief ratio and its strength is more complicated when considering the effect 
of the conditional prior than with the marginal prior. The optimality results 
concerning robustness to the marginal prior underscore this. 
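
The discretization recommended above for real-valued $\psi$ can be sketched as follows. The helper names, and the idea of applying them to Monte Carlo draws from the prior or posterior of $\psi$, are assumptions made for illustration; the paper itself only specifies the intervals.

```python
import math

def interval_index(psi, psi0, delta):
    """Index i with psi in [psi0 + (2i - 1) delta / 2, psi0 + (2i + 1) delta / 2)."""
    return math.floor((psi - psi0) / delta + 0.5)

def discretize(sample, psi0, delta):
    """Relative frequencies of the interval indices for a sample of psi values."""
    counts = {}
    for psi in sample:
        i = interval_index(psi, psi0, delta)
        counts[i] = counts.get(i, 0) + 1
    return {i: c / len(sample) for i, c in counts.items()}

# Hypothetical draws of psi; index 0 is the interval representing the hypothesis.
print(discretize([0.0, 0.01, 0.26, -0.26], psi0=0.0, delta=0.1))
```

Applying `discretize` to draws from the prior and from the posterior gives two discrete distributions on the same index set, from which a discrete relative belief ratio and its strength for index 0 can be formed.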


5 Robustness and Prior-Data Conflict 

The existence of a prior-data conflict means that the data support certain
values of $\psi = \Psi(\theta)$ being the true value but the prior places little
or no mass there. While various measures can be used to determine whether or not
such a conflict has occurred, a logical approach is based on the factorization
of the joint probability measure for $(\theta, x)$ given by
$\Pi \times P_\theta = \Pi(\cdot \mid T) \times M_T \times P(\cdot \mid T)$,
where $T$ is a minimal sufficient statistic, $\Pi(\cdot \mid T)$ is the posterior
probability measure for $\theta$, $M_T$ is the prior predictive probability
measure of $T$ and $P(\cdot \mid T)$ is the conditional probability measure of
the data given $T$. The measure $P(\cdot \mid T)$ is then available for computing
probabilities relevant to checking the model $\{f_\theta : \theta \in \Theta\}$,
the measure $M_T$ is available for computing probabilities relevant to checking
the prior and $\Pi(\cdot \mid T)$ is the relevant probability measure for
computing probabilities for $\theta$. A statistical analysis then proceeds by
checking the model, perhaps via a tail probability based on a discrepancy
statistic, and then proceeding to check the prior if the data do not contradict
the model. If neither the model nor the prior is contradicted by the data, then
we can proceed to inference about $\theta$. The logic behind this sequence lies
in part with the fact that it makes no sense to check a prior if the model
fails. Furthermore, separating the check of the prior from that of the model
provides more information in the event of a conflict arising, as it is then
possible to identify where the failure lies, namely, with the model or with the
prior.

In Evans and Moshonov (2006) this factorization was adhered to and the tail
probability

$$M_T(m_T(t) \le m_T(T(x))) \tag{8}$$

was advocated for checking the prior, where $m_T$ is the density of $M_T$ with
respect to some support measure. So if (8) is small, then the observed value
$T(x)$ of the minimal sufficient statistic lies in the tails of $M_T$ and there
is an indication of a prior-data conflict. In Evans and Jang (2011a) the validity
of this approach was firmly established by the proof that (8) converges to
$\Pi(\pi(\theta) \le \pi(\theta_{true}))$ under i.i.d. sampling and some
additional weak conditions. Furthermore, it was shown how to modify (8) so as to
achieve invariance under choice of the minimal sufficient statistic. Also, Evans
and Moshonov (2006) argued that (8) should be replaced by
$M_T(m_T(t) \le m_T(T(x)) \mid U(T(x)))$ for any maximal ancillary $U(T)$, as the
variation in $T$ due to $U(T)$ has nothing to do with $\theta$ and so reflects
nothing about the prior. The tail probability (8) is a check on the full prior
and Evans and Moshonov (2006) also developed methods for checking factors of the
prior so a failure in the prior could be isolated to a particular aspect.

First, however, consider the case when $\Psi(\theta) = \theta$ and interest is in
the robustness of inferences to the whole prior. From the results in Section 4
it is seen that the ratio $m_Q(x)/m(x) = m_{Q,T}(T(x))/m_T(T(x))$, where
$m_{Q,T}(T(x)) = \int_\Theta f_{\theta,T}(T(x))\, Q(d\theta)$, plays a key role in
determining the local sensitivity, in the direction given by $Q$, of the
inferences for given observed data $x$. This depends on $Q$ and the worst case is
given by

$$\sup_Q \frac{m_{Q,T}(T(x))}{m_T(T(x))} = \sup_Q \frac{\int_\Theta f_{\theta,T}(T(x))\, Q(d\theta)}{m_T(T(x))} = RB(\theta(x) \mid x) \tag{9}$$

and note that $\theta(x)$ is the MLE in this case as well as the relative belief
estimate. Notice that when (8) is small, so there is an indication of a
prior-data conflict existing, then $m_T(T(x))$ is relatively small when compared
to other values of $m_T(t)$ which are not influenced by the data. This implies
that the prior is having a big influence relative to the data and so a lack of
robustness can be expected.

This phenomenon is well-illustrated in the following examples where ancillaries
play no role because of Basu's theorem.

Example 1. Location normal model.

Suppose that $x = (x_1, \ldots, x_n)$ is a sample from the $N(\mu, 1)$
distribution with $\mu \sim N(\mu_0, \sigma_0^2)$. Then $M_T$ is given by
$T(x) = \bar{x} \sim N(\mu_0, 1/n + \sigma_0^2)$. When $Q$ is the
$N(\mu_1, \sigma_1^2)$ distribution, then $M_{Q,T}$ is given by
$\bar{x} \sim N(\mu_1, 1/n + \sigma_1^2)$. This implies that

$$\frac{m_{Q,T}(T(x))}{m_T(T(x))} = \sqrt{\frac{1/n + \sigma_0^2}{1/n + \sigma_1^2}}\; \frac{\exp\{-\frac{1}{2}(1/n + \sigma_1^2)^{-1}(\bar{x} - \mu_1)^2\}}{\exp\{-\frac{1}{2}(1/n + \sigma_0^2)^{-1}(\bar{x} - \mu_0)^2\}}$$

and, as a function of $(\mu_1, \sigma_1^2)$, this is maximized when
$\mu_1 = \bar{x}, \sigma_1^2 = 0$. Notice that this supremum converges to $\infty$
as $\bar{x} \to \pm\infty$ and such values correspond to prior-data conflict with
respect to the $N(\mu_0, \sigma_0^2)$ prior.

Now, consider a numerical example. A sample of size $n = 20$ was generated from
the $N(0, 1)$ distribution obtaining $\bar{x} = 0.2591$. When the base prior is
$N(0.5, 1)$ then (8) equals 0.8141 and accordingly there is no indication of any
prior-data conflict. Also, $\sup_Q (m_Q(x)/m(x)) = 4.7109$ which seems modest as
it describes the worst case robustness behavior. In Table 1 some values of
$m_Q(x)/m(x)$ are recorded when $Q$ is a $N(\mu_1, \sigma_1^2)$ distribution for
various values of $\mu_1$ and $\sigma_1^2$, as these might be expected to be
realistic directions in which to perturb the base prior. In all cases the value
of $m_Q(x)/m(x)$ is quite modest and the maximum value of (10) is 1.0534. Overall
it can be concluded here that the analysis is robust to local perturbations of
the prior.
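
The quantities in Example 1 are easy to recompute from the normal prior predictive densities. The sketch below is an illustration using only the Python standard library; the function names are hypothetical, and the inputs are the rounded values of $\bar{x}$ quoted in the text, so agreement with the reported figures is only up to rounding.

```python
import math

def normal_pdf(x, mean, var):
    """Density of the N(mean, var) distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def ratio(xbar, n, mu0, var0, mu1, var1):
    """m_Q(x)/m(x) for base prior N(mu0, var0) and Q equal to N(mu1, var1)."""
    return normal_pdf(xbar, mu1, 1 / n + var1) / normal_pdf(xbar, mu0, 1 / n + var0)

def sup_ratio(xbar, n, mu0, var0):
    """Worst case over Q, attained in the limit mu1 = xbar, var1 = 0."""
    return ratio(xbar, n, mu0, var0, xbar, 0.0)

def prior_check(xbar, n, mu0, var0):
    """Tail probability (8) under the N(mu0, 1/n + var0) prior predictive of xbar."""
    z = abs(xbar - mu0) / math.sqrt(1 / n + var0)
    return 2 * (1 - normal_cdf(z))

for xbar in (0.2591, 4.0867):  # sample means quoted for the two numerical cases
    print(xbar, prior_check(xbar, 20, 0.5, 1.0), sup_ratio(xbar, 20, 0.5, 1.0))
```

With $\bar{x} = 0.2591$ this gives a tail probability of about 0.814 and a worst-case ratio of about 4.71, while with $\bar{x} = 4.0867$ the tail probability drops to roughly 0.0005 and the worst-case ratio rises to roughly 2097, in line with the values reported for the no-conflict and conflict cases.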

Now consider an example where there is prior-data conflict. In this case a 
sample of n = 20 is generated from a iV(4,1) distribution obtaining x = 4.0867 
and the same base prior is used. The value of ([U is 0.0005 and so there is 
a strong indication of prior-data conflict. Furthermore, supg (mQ(x)/m{x)) = 
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Ml 


mQ{x)/m{x) 

Ml 

c^i 

mQ{x)/m{x) 

-3.0 

1 

0.0065 

0.5 

0.5 

1.3474 

-2.0 

1 

0.0905 

0.5 

1.0 

1.0000 

-1.0 

1 

0.4832 

0.5 

2.0 

0.7254 

1.0 

1 

0.7917 

0.5 

3.0 

0.5975 

2.0 

1 

0.2428 

0.5 

50.0 

0.1488 

3.0 

1 

0.0287 

0.5 

100.0 

0.1053 


Table 1: The ratio mQ{x)/m{x) in Example 1 when there is no conflict. 


Ml 


mQ{x)/m{x) 

Ml 


mQ{x)/m{x) 

-3.0 

1 

1.88 X 10-« 

0.5 

0.5 

0.0053 

-2.0 

1 

9.97 X 10-“ 

0.5 

1.0 

1.0000 

-1.0 

1 

2.00 X 10-^ 

0.5 

2.0 

14.2070 

1.0 

1 

4.90 X 10-1 

0.5 

3.0 

32.5842 

2.0 

1 

5.75 X 10^ 

0.5 

50.0 

58.2823 

3.0 

1 

2.61 X 10^ 

0.5 

100.0 

43.9565 


Table 2: The ratio mQ{x)/m{x) in Example 1 when there is conflict. 


2096.85 which certainly indicates a lack of robustness. In Table 2 some values 
of mQ{x)/in(x) are recorded when Q is a distribution for various 

values of /ri and af. It is seen that the value of mQ{x)/m{x) can be relatively 
large and the maximum value of (ITOll is 468.86. So it can be concluded that the 
analysis based on the model, prior and observed data, will not be robust to local 
perturbations of the prior when there is prior-data conflict. ■ 
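
The quantities in this example have closed forms, so they can be checked numerically. The following Python sketch is an illustration added for that purpose and is not code from the paper. The function names are hypothetical, and the tail probability is computed under the assumption that the check for prior-data conflict is the prior predictive probability of a density value no larger than the observed one, which for a normal prior predictive reduces to a two-sided normal tail.

```python
import math


def marginal_ratio(xbar, n, mu0, var0, mu1, var1):
    """m_Q(x)/m(x) when the N(mu0, var0) base prior is replaced by Q = N(mu1, var1)."""
    v0 = 1.0 / n + var0  # prior predictive variance of xbar under the base prior
    v1 = 1.0 / n + var1  # prior predictive variance of xbar under Q
    log_ratio = (0.5 * math.log(v0 / v1)
                 - 0.5 * (xbar - mu1) ** 2 / v1
                 + 0.5 * (xbar - mu0) ** 2 / v0)
    return math.exp(log_ratio)


def sup_marginal_ratio(xbar, n, mu0, var0):
    """Supremum over (mu1, var1), attained at mu1 = xbar and var1 = 0."""
    return marginal_ratio(xbar, n, mu0, var0, xbar, 0.0)


def conflict_tail_probability(xbar, n, mu0, var0):
    """Two-sided normal tail: probability of a prior predictive density
    no larger than the one at the observed xbar (assumed form of the check)."""
    z = abs(xbar - mu0) / math.sqrt(1.0 / n + var0)
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))


if __name__ == "__main__":
    for xbar in (0.2591, 4.0867):  # the two sample means quoted in the example
        print(xbar,
              conflict_tail_probability(xbar, 20, 0.5, 1.0),
              sup_marginal_ratio(xbar, 20, 0.5, 1.0))
```

With $n = 20$ and the $N(0.5, 1)$ base prior, the two sample means quoted in the example should give suprema close to 4.7109 and 2096.85, so the jump in the worst-case ratio can be traced directly to the exponential term in $(\bar{x} - \mu_0)^2$.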

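Example 2. Bernoulli model. 

Suppose that $x = (x_1, \ldots, x_n)$ is a sample from a Bernoulli($\theta$) and the prior 
is $\theta \sim$ beta($\alpha_0, \beta_0$) for some choice of $(\alpha_0, \beta_0)$. A minimal sufficient statistic is 
$T(x) = n\bar{x} = \sum_{i=1}^n x_i \sim$ Binomial($n, \theta$) and then 

$$m_T(t) = \binom{n}{t} \frac{\Gamma(\alpha_0 + \beta_0)}{\Gamma(\alpha_0)\Gamma(\beta_0)} \frac{\Gamma(t + \alpha_0)\Gamma(n - t + \beta_0)}{\Gamma(n + \alpha_0 + \beta_0)}.$$

Also, since the supremum of $m_Q(x)$ over $Q$ is the likelihood maximized at $\theta = t/n$ with $t = T(x)$, 

$$\sup_Q \, (m_Q(x)/m(x)) = \left(\frac{t}{n}\right)^{t} \left(1 - \frac{t}{n}\right)^{n-t} \frac{\Gamma(\alpha_0)\Gamma(\beta_0)}{\Gamma(\alpha_0 + \beta_0)} \frac{\Gamma(n + \alpha_0 + \beta_0)}{\Gamma(t + \alpha_0)\Gamma(n - t + \beta_0)}.$$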

To illustrate the relationship between prior-data conflict and robustness, 
consider a numerical example. Suppose that $\alpha_0 = 5$ and $\beta_0 = 20$. Generating 
a sample of size $n = 20$ from the Bernoulli(0.25) gave the value $n\bar{x} = 3$. In 
this case (8) equals 0.7100 and there is no indication of any prior-data conflict. 
Also, $\sup_Q (m_Q(x)/m(x)) = 1.4211$ which indicates that the inferences will be 
generally robust to small deviations. If $m_Q(x)/m(x)$ is computed for various $Q$, 
where $Q$ is a beta($\alpha_1, \beta_1$), then in all cases it is readily seen that this ratio is 
quite reasonable in value as indeed it is bounded above by 1.4211. 

A sample of $n = 20$ was also generated from a Bernoulli(0.9) with the same 
prior being used. In this case $n\bar{x} = 17$ and (8) equals $6.2 \times 10^{-6}$, so there is a 
strong indication of prior-data conflict. Also, $\sup_Q (m_Q(x)/m(x)) = 46396.43$ 
which indicates that the inferences will generally not be robust to small 
deviations. Table 3 provides some values of $m_Q(x)/m(x)$ for $Q$ given by a 
beta($\alpha_1, \beta_1$) for various choices of $(\alpha_1, \beta_1)$ and there are several large values. ■ 

  alpha_1   beta_1   m_Q(x)/m(x)   |   alpha_1   beta_1   m_Q(x)/m(x)
    20        5       32647.89     |      5         1      21523.28
    15        5       25729.50     |      5        25          0.12
    10        5       15010.95     |      5        22          0.41
     5        5        3996.37     |      5        20          1.00
     1        5         125.87     |      5        16          6.77

Table 3: The ratio $m_Q(x)/m(x)$ in Example 2 when there is conflict. 
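
The beta-binomial calculations above can be reproduced with log-gamma arithmetic. The sketch below is an added illustration with hypothetical function names, not the authors' code. It assumes that $m(x)$ is the prior predictive of the observed 0/1 sequence, so the binomial coefficient cancels in every ratio, and that the supremum over $Q$ is attained by a point mass at $\theta = t/n$.

```python
import math


def log_beta(a, b):
    """log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(t, n, a, b):
    """log m(x) for a 0/1 sample with t successes under a beta(a, b) prior."""
    return log_beta(a + t, b + n - t) - log_beta(a, b)


def marginal_ratio(t, n, a0, b0, a1, b1):
    """m_Q(x)/m(x) when the beta(a0, b0) prior is replaced by Q = beta(a1, b1)."""
    return math.exp(log_marginal(t, n, a1, b1) - log_marginal(t, n, a0, b0))


def sup_marginal_ratio(t, n, a0, b0):
    """Worst case over all Q: maximized likelihood divided by m(x)."""
    p = t / n
    log_lik = 0.0
    if t > 0:
        log_lik += t * math.log(p)
    if t < n:
        log_lik += (n - t) * math.log(1.0 - p)
    return math.exp(log_lik - log_marginal(t, n, a0, b0))


if __name__ == "__main__":
    for t in (3, 17):  # the two observed values of the sufficient statistic
        print(t, sup_marginal_ratio(t, 20, 5, 20))
    print(marginal_ratio(17, 20, 5, 20, 20, 5))
```

For the beta(5, 20) base prior with $n = 20$, the two calls to `sup_marginal_ratio` should return values close to 1.4211 and 46396.43, and the last line should be close to the first entry of Table 3.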

Now consider the case when $\theta = (\theta_1, \theta_2) \in \Theta_1 \times \Theta_2$ so the prior factors 
as $\pi(\theta) = \pi_2(\theta_2 \mid \theta_1)\pi_1(\theta_1)$. Presumably the conditional prior $\pi_2(\cdot \mid \theta_1)$ and the 
marginal prior $\pi_1$ are elicited and the goal is inference about some $\psi = \Psi(\theta)$. 
It is then preferable to check the prior by checking each individual component 
for prior-data conflict as this leads to more information about where a conflict 
exists when it does. 

In general, it is not clear how to check the individual components but in 
certain contexts a particular structure holds that allows for this. Suppose that 
all ancillaries are independent of the minimal sufficient statistic and so can be 
ignored. The more general situation is covered in Evans and Moshonov (2006). 

As discussed in Evans and Moshonov (2006), suppose there is a statistic 
$V(T)$ such that the marginal distribution of $V(T)$ is dependent only on $\theta_1$. Such 
a statistic is referred to as being ancillary for $\theta_2$ given $\theta_1$. Naturally we want 
$V(T)$ to be a maximal ancillary for $\theta_2$ given $\theta_1$. An appropriate tail probability 
for checking $\pi_1$ is then given by 

$$M_{V(T)}(m_{V(T)}(v) \le m_{V(T)}(V(T(x)))), \qquad (11)$$

as $M_{V(T)}$ does not depend on $\pi_2(\cdot \mid \theta_1)$. A natural order is to check $\pi_1$ first and 
then check $\pi_2(\cdot \mid \theta_1)$ for prior-data conflict, whenever no prior-data conflict is 
found for $\pi_1$. The appropriate tail probability for checking $\pi_2(\cdot \mid \theta_1)$ is given by 

$$M_T(m_T(t \mid V(T(x))) \le m_T(T(x) \mid V(T(x))) \mid V(T(x))). \qquad (12)$$

Note that this is assessing whether or not $\pi_2(\cdot \mid \theta_1)$ is a suitable prior for $\theta_2$ 
among those $\theta_1$ values deemed to be suitable according to the prior $\pi_1$. If (12) 
were to be used before (11), then it would not be possible to assess if a failure 
was due to where $\pi_1$ was placing the bulk of its mass or was caused by where 
the conditional priors were placing their mass. Notice that 

$$\frac{m_{Q,T}(T(x))}{m_T(T(x))} = \frac{m_{Q,T}(T(x) \mid V(T(x)))}{m_T(T(x) \mid V(T(x)))} \, \frac{m_{Q,V(T)}(V(T(x)))}{m_{V(T)}(V(T(x)))}, \qquad (13)$$

so prior-data conflict with either $\pi_1$ or $\pi_2(\cdot \mid \theta_1)$ could lead to large values of the 
ratio on the left for certain choices of $Q$. When only the conditional prior of $\theta_2$ 
given $\theta_1$ is perturbed, then $m_{Q,V(T)}(V(T(x))) = m_{V(T)}(V(T(x)))$. 

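Letting $f_{\theta_1,V}$ denote the density of $V$, then 

$$\frac{m_{Q,V(T)}(V(T(x)))}{m_{V(T)}(V(T(x)))} \le \sup_{\theta_1} \frac{f_{\theta_1,V}(V(T(x)))}{m_{V(T)}(V(T(x)))} = \sup_{\theta_1} RB_1(\theta_1 \mid V(T(x))),$$

where $RB_1(\cdot \mid V(T(x)))$ gives the relative belief ratios for $\theta_1$ based on having 
observed $V(T(x))$. The right-hand side gives the worst-case behavior of the 
second factor in (13). 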

Now consider the robustness of relative belief inferences for a general $\psi = 
\Psi(\theta)$. The following result generalizes Propositions 3 and 6 as we consider a 
general perturbation to the prior, namely, $\Pi_\epsilon = (1 - \epsilon)\Pi + \epsilon Q$, and the proof is 
the same as that of Proposition 6. 

Proposition 8 The Gateaux derivative of $RB_\Psi(\cdot \mid x)$ at $\psi$ in the direction $Q$ is 
$\{m_Q(x)/m(x)\}(RB_{Q,\Psi}(\psi \mid x) - RB_\Psi(\psi \mid x))$. 

The factor $RB_{Q,\Psi}(\psi \mid x) - RB_\Psi(\psi \mid x)$ can be big simply because we choose 
a prior $Q$ that is very different from $\Pi$. For example, $RB_\Psi(\psi \mid x)$ may be big 
(small) because there is considerable evidence in favor of (against) $\psi$ being the 
true value and we can choose a prior $Q$ that doesn't (does) place mass near $\psi$. 
As such, it makes sense to standardize the derivative by dividing by this factor 
and this leaves the robustness determined again by $m_Q(x)/m(x)$. 

Suppose now that $Q$ and $\Pi$ have the same marginal for $\xi = \Xi(\theta)$. Then, 
$m_Q(x) = m(x) \int \int RB(\theta \mid x) \, Q(d\theta \mid \xi) \, \Pi_\Xi(d\xi) \le m(x) \int RB(\theta_\xi(x) \mid x) \, \Pi_\Xi(d\xi)$ 
where $\theta_\xi(x) = \arg\sup\{RB(\theta \mid x) : \Xi(\theta) = \xi\}$. Therefore, 

$$\frac{m_Q(x)}{m(x)} \le \int RB(\theta_\xi(x) \mid x) \, \Pi_\Xi(d\xi) \qquad (14)$$

and the right-hand side gives the worst-case behavior of the first factor in 
(13) when $\Xi(\theta) = \theta_1$ which is related to prior-data conflict with the prior on $\theta_2$. 
The following is a standard example where priors are specified hierarchically. 

Example 3. Location-scale normal model. 

Suppose that $x = (x_1, \ldots, x_n)$ is a sample from the $N(\mu, \sigma^2)$ distribution 
with $\mu \mid \sigma^2 \sim N(\mu_0, \tau_0^2\sigma^2)$, $\sigma^{-2} \sim \text{gamma}_{\text{rate}}(\alpha_0, \beta_0)$. Then $T(x) = (\bar{x}, \|x - 
\bar{x}1\|^2)$ is a minimal sufficient statistic for the model. Note that the prior is chosen 
by eliciting values for $\mu_0, \tau_0^2, \alpha_0, \beta_0$ and so there is interest in how sensitive 
inferences are to perturbations in each component separately. The posterior 
distribution of $(\mu, \sigma^2)$ is given by $\mu \mid \sigma^2, T(x) \sim N(\mu_x, (n + 1/\tau_0^2)^{-1}\sigma^2)$, $\sigma^{-2} \mid T(x) \sim 
\text{gamma}_{\text{rate}}(\alpha_0 + n/2, \beta(\bar{x}, s^2))$ where $\mu_x = (n + 1/\tau_0^2)^{-1}(n\bar{x} + \mu_0/\tau_0^2)$ and $\beta(\bar{x}, s^2) 
= \beta_0 + (n - 1)s^2/2 + n(\bar{x} - \mu_0)^2/2(n\tau_0^2 + 1)$ with $s^2 = \|x - \bar{x}1\|^2/(n - 1)$. 
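
The conjugate update just stated can be written as a small helper. This is an added sketch with hypothetical names. It simply evaluates the formulas for $\mu_x$ and $\beta(\bar{x}, s^2)$ given above, and the sample values in the usage lines are the ones reported for the $N(0, 1)$ sample in the numerical part of this example.

```python
def posterior_hyperparameters(xbar, s2, n, mu0, tau0_sq, alpha0, beta0):
    """Posterior hyperparameters for the normal location-scale model.

    Returns (mu_x, var_factor, alpha_x, beta_x), where
      mu | sigma^2, T(x) ~ N(mu_x, var_factor * sigma^2)
      1/sigma^2 | T(x)   ~ gamma with shape alpha_x and rate beta_x.
    """
    precision = n + 1.0 / tau0_sq
    mu_x = (n * xbar + mu0 / tau0_sq) / precision
    var_factor = 1.0 / precision
    alpha_x = alpha0 + n / 2.0
    beta_x = (beta0
              + (n - 1) * s2 / 2.0
              + n * (xbar - mu0) ** 2 / (2.0 * (n * tau0_sq + 1.0)))
    return mu_x, var_factor, alpha_x, beta_x


if __name__ == "__main__":
    # Base prior mu0 = 0, tau0^2 = 1, alpha0 = beta0 = 5 with n = 20.
    print(posterior_hyperparameters(-0.1066, 0.9087, 20, 0.0, 1.0, 5.0, 5.0))
```

The last term of `beta_x` is the only place where the prior on $\mu$ meets the data, which is why a sample mean far from $\mu_0$ inflates the posterior rate.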

Consider first inferences for $\psi = \Psi(\theta) = \sigma^2$ and note that $V(T(x)) = \|x - 
\bar{x}1\|^2$ is ancillary given $\psi$ and its distribution depends on $\psi$. Therefore, the prior 
on $\sigma^2$ is checked first using the prior predictive for $V(T(x))$. An easy calculation 
gives that the prior distribution of $s^2 = V(T(x))/(n - 1)$ is $(\beta_0/\alpha_0)F(n - 1, 2\alpha_0)$ 
and this specifies (11). While the results of Section 4.1 apply here, consider 
the behavior of the relative belief ratio $RB_1(\sigma^2 \mid V(T(x)))$ which is based on 
only observing $V(T(x))$ rather than $T(x)$. By Proposition 3 this has Gateaux 
derivative depending on $m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x)))$. Notice, however, 
that relative belief ratios accumulate evidence in a simple way. For any statistic 
$V(T(x))$, 

$$RB_\Psi(\psi \mid T(x)) = \frac{\pi_\Psi(\psi \mid T(x))}{\pi_\Psi(\psi)} = \frac{\pi_\Psi(\psi \mid V(T(x)))}{\pi_\Psi(\psi)} \, \frac{\pi_\Psi(\psi \mid T(x))}{\pi_\Psi(\psi \mid V(T(x)))}$$

where the first factor gives the evidence obtained after observing $V(T(x))$ and 
the second factor gives the evidence obtained after observing $T(x)$ having 
already observed $V(T(x))$. So $RB_1(\sigma^2 \mid x) = RB_1(\sigma^2 \mid V(T(x)))[RB_1(\sigma^2 \mid T(x))/ 
RB_1(\sigma^2 \mid V(T(x)))]$ with the same interpretation for the factors. As such, a 
lack of robustness of $RB_1(\sigma^2 \mid V(T(x)))$, which can be connected to prior-data 
conflict through (11), implies a lack of robustness for $RB_1(\sigma^2 \mid x)$. 

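When no prior-data conflict is obtained for the prior on $\sigma^2$, then it makes 
sense to look for prior-data conflict with the prior on $\mu$ which is typically the 
parameter of primary interest. So now consider perturbations to the prior 
on $\mu$ and the relationship to prior-data conflict with this prior. The 
conditional distribution of $T(x)$ given $V(T(x))$ is given by the conditional prior 
predictive of $\bar{x}$ given $s^2$ which is distributed as $\mu_0 + \tilde{\sigma} t_{n+2\alpha_0-1}$ where $\tilde{\sigma}^2 = 
(n\tau_0^2 + 1)(2\beta_0 + (n - 1)s^2)/\{n(n + 2\alpha_0 - 1)\}$, specifying (12). 
Furthermore, for (14), $\Xi(\theta) = \sigma^2$ and $\theta_{\sigma^2}(x) = (\bar{x}, \sigma^2)$ with 

$$RB((\bar{x}, \sigma^2) \mid x) = (n\tau_0^2 + 1)^{1/2} \, \frac{\Gamma(\alpha_0)}{\beta_0^{\alpha_0}\Gamma(\alpha_0 + n/2)} \, (\beta(\bar{x}, s^2))^{\alpha_0 + n/2} \, (\sigma^2)^{-n/2} \exp\left\{-\frac{(n - 1)s^2}{2\sigma^2}\right\}$$

and so 

$$\int_0^\infty RB((\bar{x}, \sigma^2) \mid x) \, \Pi_1(d\sigma^{-2}) = (n\tau_0^2 + 1)^{1/2} \left[\frac{\beta(\bar{x}, s^2)}{\beta_0 + (n - 1)s^2/2}\right]^{\alpha_0 + n/2}.$$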

Now consider a number of numerical examples where the base prior is always 
specified by $\mu_0 = 0$, $\tau_0 = 1$, $\alpha_0 = 5$ and $\beta_0 = 5$. The behavior of the two factors 
in (13) is examined when there is no prior-data conflict and when there is. 

  alpha_1   beta_1   ratio   |   alpha_1   beta_1   ratio
     5         1      0.05   |      1         5      0.07
     5         2      0.38   |      2         5      0.25
     5         4      0.99   |      4         5      0.81
     5        10      0.34   |     10         5      0.53

Table 4: The ratio $m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x)))$ in Example 3 when 
there is no conflict with the prior on $\sigma^2$. 

  alpha_1   beta_1   ratio   |   alpha_1   beta_1     ratio
     5         1      0.00   |      1         5      5517.42
     5         2      0.01   |      2         5      1245.26
     5         4      2.34   |      4         5        13.78
     5        10     23.51   |     10         5         0.00

Table 5: The ratio $m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x)))$ in Example 3 when 
there is conflict with the prior on $\sigma^2$. 


A sample of size $n = 20$ was generated from the $N(0, 1)$ distribution 
obtaining $\bar{x} = -0.1066$, $s^2 = 0.9087$. So there should be no prior-data conflict with the 
prior on $\sigma^2$. Indeed, (11) equals 0.7626 so there is no indication of any problems 
with the prior on $\sigma^2$. Values of $m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x)))$ are recorded 
in Table 4 when the marginal prior on $\sigma^2$ is perturbed by a $\text{gamma}_{\text{rate}}(\alpha_1, \beta_1)$ 
distribution for $\sigma^{-2}$, for various values of $\alpha_1$ and $\beta_1$. In all cases, the ratio is small 
and indicates robustness to local perturbations of the prior on $\sigma^2$. Note that 
the worst case behavior, over all possible directions, is given by the maximized 
relative belief ratio for $\sigma^2$ based on $V(T(x))$ which occurs at $s^2$ and equals 

$$RB_1(s^2 \mid V(T(x))) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_0 + (n - 1)/2)} \, \frac{(\beta_0 + (n - 1)s^2/2)^{\alpha_0 + (n-1)/2}}{\beta_0^{\alpha_0}} \, (s^2)^{-(n-1)/2} e^{-(n-1)/2}.$$

In this case $RB_1(s^2 \mid V(T(x))) = 1.7479$. 
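
The maximized relative belief ratio can be evaluated directly from the gamma-function expression. The sketch below is an added illustration with a hypothetical function name. It assumes that $V(T(x)) = (n-1)s^2$ given $\sigma^2$ is $\sigma^2$ times a chi-squared variable with $n - 1$ degrees of freedom and that $\sigma^{-2}$ has the $\text{gamma}_{\text{rate}}(\alpha_0, \beta_0)$ prior, so that the maximum over $\sigma^2$ occurs at $s^2$.

```python
import math


def max_rb_sigma2_given_v(s2, n, alpha0, beta0):
    """Maximized relative belief ratio for sigma^2 based on V(T(x)) = (n-1)s^2,
    i.e. the worst-case value of the second factor in the decomposition."""
    k = (n - 1) / 2.0
    log_rb = (math.lgamma(alpha0) - math.lgamma(alpha0 + k)
              + (alpha0 + k) * math.log(beta0 + k * s2)  # (n-1)s^2/2 = k*s2
              - alpha0 * math.log(beta0)
              - k * math.log(s2)
              - k)
    return math.exp(log_rb)


if __name__ == "__main__":
    # Base prior alpha0 = beta0 = 5 and the sample variance quoted above.
    print(max_rb_sigma2_given_v(0.9087, 20, 5.0, 5.0))
```

With $n = 20$, $\alpha_0 = \beta_0 = 5$ and $s^2 = 0.9087$ this should return a value close to the 1.7479 reported above.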

Next a sample of size $n = 20$ from the $N(0, 25)$ was generated obtaining 
$\bar{x} = 0.0950$, $s^2 = 23.9593$. So there is clearly prior-data conflict with the prior 
on $\sigma^2$. This is reflected in the value of (11) which equals $0.64 \times 10^{-5}$. Table 
5 shows that there is a serious lack of robustness. The worst case behavior is 
given by $RB_1(s^2 \mid V(T(x))) = 40484.68$. 
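
The tabulated ratios for gamma perturbations of the prior on $\sigma^{-2}$ have a closed form, since the prior predictive density of $V(T(x))$ under a $\text{gamma}_{\text{rate}}(\alpha, \beta)$ prior is proportional, as a function of $(\alpha, \beta)$, to $\beta^{\alpha}\Gamma(\alpha + (n-1)/2)/\{\Gamma(\alpha)(\beta + V/2)^{\alpha + (n-1)/2}\}$. The sketch below is an added illustration with hypothetical names that evaluates this ratio; it is a derivation under the stated model and not the authors' code.

```python
import math


def log_predictive_kernel(v, n, alpha, beta):
    """log prior predictive density of V = (n-1)s^2, dropping terms that
    do not depend on (alpha, beta) and therefore cancel in ratios."""
    k = (n - 1) / 2.0
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + math.lgamma(alpha + k)
            - (alpha + k) * math.log(beta + v / 2.0))


def predictive_ratio(s2, n, alpha0, beta0, alpha1, beta1):
    """m_{Q,V(T)}(V(T(x))) / m_{V(T)}(V(T(x))) for Q given by gamma(alpha1, beta1)."""
    v = (n - 1) * s2
    return math.exp(log_predictive_kernel(v, n, alpha1, beta1)
                    - log_predictive_kernel(v, n, alpha0, beta0))


if __name__ == "__main__":
    # Compare a no-conflict and a conflict sample variance for one perturbation.
    for s2 in (0.9087, 23.9593):
        print(s2, predictive_ratio(s2, 20, 5.0, 5.0, 1.0, 5.0))
```

Evaluating the same perturbation at both sample variances shows how a direction that is harmless without conflict can produce a ratio in the thousands when the observed $s^2$ lies far in the tail of the prior predictive.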

It is also relevant to consider what happens concerning the robustness of 
inferences about $\sigma^2$ when there is prior-data conflict with the prior on $\mu$ but 
not with the prior on $\sigma^2$. A sample of $n = 20$ was generated from the $N(10, 1)$ 
distribution obtaining $\bar{x} = 9.7941$, $s^2 = 1.0082$, so there is clearly prior-data 
conflict with the prior on $\mu$ but not with the prior on $\sigma^2$. The value of (11) 
equals 0.6460 which gives no reason to doubt the relevance of the prior on $\sigma^2$. 


  alpha_1   beta_1   ratio   |   alpha_1   beta_1   ratio
     5         1      0.03   |      1         5      0.09
     5         2      0.29   |      2         5      0.31
     5         4      0.92   |      4         5      0.86
     5        10      0.44   |     10         5      0.38

Table 6: The ratio $m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x)))$ in Example 3 when 
there is conflict with the prior on $\mu$ but not with the prior on $\sigma^2$. 

  mu_1   tau_1   ratio   |   mu_1   tau_1   ratio
   -2      1      0.17   |    0       2      0.51
   -1      1      0.66   |    0       3      0.34
    1      1      0.54   |    0       4      0.26
    2      1      0.12   |    0       5      0.21

Table 7: The ratio $m_{Q,T}(T(x) \mid V(T(x)))/m_T(T(x) \mid V(T(x)))$ in Example 3 
when there is no conflict with the prior on $\mu$ or with the prior on $\sigma^2$. 


Table 6 shows that $m_{Q,V(T)}(V(T(x)))/m_{V(T)}(V(T(x)))$ is small and indicates 
robustness to local perturbations of the prior on $\sigma^2$. The worst case behavior 
is given by $RB_1(s^2 \mid V(T(x))) = 1.7218$. This reinforces the claim that the tail 
probabilities (11) and (12) are measuring different aspects of the data conflicting 
with the prior. 

Now consider perturbations to the prior on $\mu$ with the prior on $\sigma^2$ fixed. A 
sample of $n = 20$ was generated from a $N(0, 1)$ obtaining $\bar{x} = -0.1066$, $s^2 = 
0.9087$ so there is clearly no prior-data conflict with either component. This 
is reflected in the value of (12) which equals 0.9150. Table 7 shows that the 
first factor $m_{Q,T}(T(x) \mid V(T(x)))/m_T(T(x) \mid V(T(x)))$ in (13) is small when the 
conditional prior on $\mu$ is perturbed by $N(\mu_1, \tau_1^2\sigma^2)$ priors and thus demonstrates 
robustness to perturbations in these directions. The worst case behavior is given 
by $\int_0^\infty RB((\bar{x}, \sigma^2) \mid x) \, \Pi_1(d\sigma^{-2}) = 4.6099$ which is comparatively small. 
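
Because the conditional prior predictive of $\bar{x}$ given $s^2$ is a location-scale Student $t$ with $n + 2\alpha_0 - 1$ degrees of freedom under both the base and the perturbed prior, the first factor is a ratio of two $t$ densities with different locations and scales. The sketch below is an added illustration with hypothetical names. It assumes the scale $\tilde{\sigma}^2 = (n\tau^2 + 1)(2\beta_0 + (n-1)s^2)/\{n(n + 2\alpha_0 - 1)\}$, derived here from the stated model, and it assumes the perturbation is specified through $\tau_1$, so that the perturbed conditional prior is $N(\mu_1, \tau_1^2\sigma^2)$.

```python
import math


def predictive_scale_sq(s2, n, tau, alpha0, beta0):
    """Squared scale of the conditional prior predictive of xbar given s^2
    (assumed form, derived from the conjugate model)."""
    return ((n * tau * tau + 1.0) * (2.0 * beta0 + (n - 1) * s2)
            / (n * (n + 2.0 * alpha0 - 1.0)))


def conditional_predictive_ratio(xbar, s2, n, mu0, tau0, mu1, tau1, alpha0, beta0):
    """m_{Q,T}(T(x) | V(T(x))) / m_T(T(x) | V(T(x))) for Q replacing the
    N(mu0, tau0^2 sigma^2) conditional prior by N(mu1, tau1^2 sigma^2)."""
    df = n + 2.0 * alpha0 - 1.0

    def log_t_kernel(mu, tau):
        # log Student-t density at xbar, up to a constant depending only on df
        scale_sq = predictive_scale_sq(s2, n, tau, alpha0, beta0)
        z_sq = (xbar - mu) ** 2 / scale_sq
        return -0.5 * math.log(scale_sq) - 0.5 * (df + 1.0) * math.log1p(z_sq / df)

    return math.exp(log_t_kernel(mu1, tau1) - log_t_kernel(mu0, tau0))


if __name__ == "__main__":
    # No-conflict sample, base prior mu0 = 0, tau0 = 1, alpha0 = beta0 = 5.
    for mu1, tau1 in ((-2.0, 1.0), (0.0, 2.0)):
        print(mu1, tau1,
              conditional_predictive_ratio(-0.1066, 0.9087, 20, 0.0, 1.0,
                                           mu1, tau1, 5.0, 5.0))
```

Under these assumptions the function gives values that round to the entries of Table 7 for the no-conflict sample. For the conflicting sample the results agree with the published figures only to within a few percent, which may reflect rounding of the reported $\bar{x}$ and $s^2$.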

Table 8 gives some values of $m_{Q,T}(T(x) \mid V(T(x)))/m_T(T(x) \mid V(T(x)))$ when 
a sample of $n = 20$ was generated from a $N(0, 25)$, obtaining $\bar{x} = 0.0950$, $s^2 = 
23.9593$. So in this case there is prior-data conflict with the prior on $\sigma^2$ but not 
with the prior on $\mu$. The value of (12) equals 0.9150 which gives no indication 
of prior-data conflict with the prior on $\mu$. The tabulated values also indicate no 
serious robustness concerns as does $\int_0^\infty RB((\bar{x}, \sigma^2) \mid x) \, \Pi_1(d\sigma^{-2}) = 4.5838$. This 
also reinforces the claim that the tail probabilities (11) and (12) are measuring 
different aspects of the data conflicting with the prior. 

Table 9 gives some values of $m_{Q,T}(T(x) \mid V(T(x)))/m_T(T(x) \mid V(T(x)))$ when 
a sample of $n = 20$ was generated from a $N(10, 1)$ obtaining $\bar{x} = 9.7941$, $s^2 = 
1.0082$. So in this case there is prior-data conflict with the prior on $\mu$ but not 
with the prior on $\sigma^2$. The value of (12) equals $0.1691 \times 10^{-9}$ which gives a 
clear indication of prior-data conflict with the prior on $\mu$. In this case the 
tabulated values indicate a clear lack of robustness with respect to the prior on 
$\mu$. Also, $\int_0^\infty RB((\bar{x}, \sigma^2) \mid x) \, \Pi_1(d\sigma^{-2}) = 8,046,933,962$ indicates that the worst 
case behavior with respect to robustness is terrible. 

  mu_1   tau_1   ratio   |   mu_1   tau_1   ratio
   -2      1      0.87   |    0       2      0.51
   -1      1      0.96   |    0       3      0.34
    1      1      0.98   |    0       4      0.26
    2      1      0.90   |    0       5      0.21

Table 8: The ratio $m_{Q,T}(T(x) \mid V(T(x)))/m_T(T(x) \mid V(T(x)))$ in Example 3 
when there is conflict with the prior on $\sigma^2$ but not with the prior on $\mu$. 

  mu_1   tau_1    ratio    |   mu_1   tau_1      ratio
   -2      1       0.01    |    0       2        117,584
   -1      1       0.10    |    0       3      5,611,980
    1      1      10.83    |    0       4     26,012,609
    2      1     132.09    |    0       5     55,478,630

Table 9: The ratio $m_{Q,T}(T(x) \mid V(T(x)))/m_T(T(x) \mid V(T(x)))$ in Example 3 
when there is no conflict with the prior on $\sigma^2$ but there is with the prior on $\mu$. 

6 Conclusions 

Several optimal robustness results have been derived here for relative belief 
inferences. These and other results suggest a natural preference for these 
inferences over other Bayesian inferences for estimation and hypothesis assessment. 
Even though relative belief inferences may be the most robust to choice of prior, 
this does not guarantee that they are robust in practice. The issue of practical 
robustness in a given problem is seen to be connected with whether or not 
there is prior-data conflict. With no prior-data conflict the inferences are robust 
to small changes in the prior, at least in the sense measured here. This adds 
support to the point of view that checking for prior-data conflict is an essential 
aspect of good statistical practice. 

It is interesting that the worst case behavior of the measure of sensitivity 
is associated with the maximized value of a relative belief ratio. The actual 
maximum value attained is meaningless, however, as there is no way to calibrate 
this as opposed to calibrating the relative belief ratio at a fixed value via the 
strength. The relative belief estimate is consistent, however, and the relative 
belief ratio at this value will, at least in the continuous case, converge to infinity. 
So large values would seem to be associated with high evidence in favor. What 
has been shown here is that large values can be associated with prior-data 
conflict and a lack of robustness rather than providing high evidence. When 
prior-data conflict is encountered the prior can be modified, following Evans 
and Jang (2011b), to avoid this. While objections can be raised to taking such 
a step, it seems necessary if we want to report a valid characterization of the 
evidence obtained. 

In Baskurt and Evans (2013) the relationship between relative belief ratios 
and Bayes factors is examined. Both serve as measures of evidence but the 
relative belief ratio is a simpler, more direct measure and it has many nice 
mathematical properties. It is the case too that often relative belief ratios and 
Bayes factors agree. For example, in the case of continuous priors, when the 
Bayes factor at a point is defined as a limit, then these quantities are the same. 
As such, it is reasonable to expect that the results derived here about relative 
belief inferences will apply equally well to inferences based on Bayes factors. 

Appendix 

Proof of Lemma 1. Note first that 

$$Q(A \mid x) = \frac{1}{m_Q(x)} \int_A m(x \mid \psi) \, Q(d\psi). \qquad (15)$$

Therefore, using (15), 

$$\Pi_\Psi^\epsilon(A \mid x) = \frac{(1 - \epsilon)m(x)\Pi_\Psi(A \mid x) + \epsilon m_Q(x) Q(A \mid x)}{(1 - \epsilon)m(x) + \epsilon m_Q(x)}$$

$$= \frac{\Pi_\Psi(A \mid x) + \frac{\epsilon}{(1-\epsilon)m(x)} \int_A m(x \mid \psi) \, Q(d\psi)}{1 + \frac{\epsilon}{(1-\epsilon)m(x)} \left\{\int_A m(x \mid \psi) \, Q(d\psi) + \int_{A^c} m(x \mid \psi) \, Q(d\psi)\right\}}$$

$$\le \frac{\Pi_\Psi(A \mid x) + \frac{\epsilon}{(1-\epsilon)m(x)} \int_A m(x \mid \psi) \, Q(d\psi)}{1 + \frac{\epsilon}{(1-\epsilon)m(x)} \int_A m(x \mid \psi) \, Q(d\psi)}$$

and the last inequality is an equality when $Q(A^c) = 0$. Result (i) then follows 
since $(p + y)/(1 + y) = 1 - (1 - p)/(1 + y)$ is increasing in $y \ge 0$ when $1 - p \ge 0$ and 
clearly $\sup_Q \int_A m(x \mid \psi) \, Q(d\psi) = \lim_{\delta \to 0} \int_A m(x \mid \psi) \, Q_\delta(d\psi) = \sup_{\psi \in A} m(x \mid \psi)$ 
where $Q_\delta$ places all of its mass on the set $\{\psi : m(x \mid \psi) > \sup_{\psi \in A} m(x \mid \psi) - 
\delta\} \cap A$. 

For result (ii) we have that 

$$\Pi_\Psi^\epsilon(A \mid x) \ge \frac{\Pi_\Psi(A \mid x) + \frac{\epsilon}{(1-\epsilon)m(x)} \int_A m(x \mid \psi) \, Q(d\psi)}{1 + \frac{\epsilon}{(1-\epsilon)m(x)} \left(\int_A m(x \mid \psi) \, Q(d\psi) + \sup_{\psi \in A^c} m(x \mid \psi)\right)}$$

$$\ge \frac{\Pi_\Psi(A \mid x)}{1 + \frac{\epsilon}{(1-\epsilon)m(x)} \sup_{\psi \in A^c} m(x \mid \psi)}$$

where the first inequality is obvious and the second follows since $(p + y)/(1 + 
y + \delta) = 1 - (1 - p + \delta)/(1 + y + \delta)$ is increasing in $y \ge 0$ when $1 - p + \delta \ge 0$ and 
so the minimum is attained at $y = 0$. The inequalities are equalities whenever 
$Q(A) = 0$. We then argue as in (i). 

For (iii) a direct calculation yields 

$$\delta(A) = \sup_Q \Pi_\Psi^\epsilon(A \mid x) - \inf_Q \Pi_\Psi^\epsilon(A \mid x)$$

$$= \frac{\Pi_\Psi(A \mid x)(1 + \epsilon^* r(A^c)) + \epsilon^* r(A)(1 + \epsilon^* r(A^c)) - \Pi_\Psi(A \mid x)(1 + \epsilon^* r(A))}{(1 + \epsilon^* r(A))(1 + \epsilon^* r(A^c))}$$

$$= \frac{\Pi_\Psi(A \mid x) \, \epsilon^* (r(A^c) - r(A))}{(1 + \epsilon^* r(A))(1 + \epsilon^* r(A^c))} + \frac{\epsilon^* r(A)}{1 + \epsilon^* r(A)}.$$

Result (iv) follows from $\delta(A^c) = \sup_Q (1 - \Pi_\Psi^\epsilon(A \mid x)) - \inf_Q (1 - \Pi_\Psi^\epsilon(A \mid x)) = 
\sup_Q (-\Pi_\Psi^\epsilon(A \mid x)) - \inf_Q (-\Pi_\Psi^\epsilon(A \mid x)) = \delta(A)$. 